[V2] Changes to language definition #1516

teofr · 2026-01-28T10:12:21Z

Changes to the V2 language definition. The main reason is to facilitate creating an LR(1) parser. The more complex ones are:

 - Changes to TupleDeconstructionStatement, making it stricter, with a clear separation between var-style declarations and explicit ones.
 - Changes to IdentifierPath, due to making address a reserved keyword.

For another PR/discussion, we considered merging TupleDeconstructionStatement and VariableDeclaration, to merge all variable declarations together, however I think this will look a bit artificial since their shape is quite different. We can still force it if we consider there's value in it, but I don't think it's worth it for now; they'll probably be joined in one of the passes simplifying the AST.

changeset-bot · 2026-01-28T10:12:25Z

⚠️ No Changeset found

Latest commit: c14dcbf

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


OmarTawfik

Left a few questions/suggestions. Thanks!

OmarTawfik · 2026-01-30T14:50:03Z

crates/solidity-v2/inputs/language/src/BREAKING_CHANGES.md

 - `HexLiteral` and `YulHexLiteral` and `DecimalLiteral` and `YulDecimalLiteral`:
    - It was illegal for them to be followed by `IdentifierStart`. Now we will produce two separate tokens rather than rejecting it.
+
+## Language Definition Changes


I suggest renaming this section to Grammar, since the rest of the doc also lists language definition changes:

## Grammar

OmarTawfik · 2026-01-30T14:54:16Z

crates/solidity-v2/inputs/language/src/BREAKING_CHANGES.md

+
+### IdentifierPath
+
+Changed from a simple `Separated` list to a structured format to allow the reserved `address` keyword to appear in identifier paths (but not as the head):


I wonder if we could just use Separated(MemberAccessIdentifier, Period) for simplicity? We wouldn't need the extra type, given how commonly IdentifierPath is used.

Also, WDYT of IdentifierPathElement instead of MemberAccessIdentifier? The latter conflicts with the fact that one of its two variants is no longer an identifier.

Yeah, that'd be cleaner. It could potentially introduce some ambiguity, but that's probably solvable by writing the parser rule by hand. I'll go that way.
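To illustrate the flat `Separated(IdentifierPathElement, Period)` shape discussed above, here is a minimal Rust sketch. It is not the generated parser or the DSL definition: `parse_identifier_path` and `is_identifier` are hypothetical helpers, the identifier check is deliberately simplified, and other reserved keywords are ignored. It only shows how the "`address` may appear, but not as the head" rule could be enforced over a flat list.

```rust
/// One element of an identifier path. The name follows the suggestion in the
/// review; the two variants are an assumption about its shape.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum IdentifierPathElement {
    Identifier(String),
    AddressKeyword,
}

/// Splits `source` on `.` and classifies each element.
/// Returns `None` if any element is not an identifier, or if the reserved
/// `address` keyword appears as the head of the path.
pub fn parse_identifier_path(source: &str) -> Option<Vec<IdentifierPathElement>> {
    let mut elements = Vec::new();
    for (index, part) in source.split('.').enumerate() {
        let element = if part == "address" {
            if index == 0 {
                // `address` is reserved, so it cannot start a path.
                return None;
            }
            IdentifierPathElement::AddressKeyword
        } else if is_identifier(part) {
            IdentifierPathElement::Identifier(part.to_string())
        } else {
            return None;
        };
        elements.push(element);
    }
    Some(elements)
}

/// Simplified identifier check (ASCII letters, digits, `_` and `$`; no
/// leading digit). Keywords other than `address` are not considered here.
fn is_identifier(text: &str) -> bool {
    let mut chars = text.chars();
    match chars.next() {
        Some(first) if first.is_ascii_alphabetic() || first == '_' || first == '$' => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '$')
        }
        _ => false,
    }
}
```

With this shape, `foo.address` is accepted as two elements, while `address.foo` is rejected because of its head.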

OmarTawfik · 2026-01-30T14:59:27Z

crates/solidity-v2/inputs/language/src/BREAKING_CHANGES.md

+The cases where using empty tuples are still ambiguous, `(,,,) = ...` can still be a `TupleDeconstructionStatement` or a
+an `AssignmentExpression` with a `TupleExpression` on the lhs.


can still be a TupleDeconstructionStatement or a an AssignmentExpression

Which one? I wonder if we have existing cst_output tests for this case?

Also, how about var (,,,,)? this is legal AFAICT.

Which one?

This is a tricky question since there's still no parser for V2, but I'll try to answer it:

Right now the v2 language definition (with these changes) is still ambiguous, so in theory parsing either one of those options is correct.

If we do as V1 does and give priority to definitions higher up in the definition.rs file, then they'd be parsed as TupleDeconstructionStatement.

Once the V2 parser is done, we need to handle this ambiguity and choose one. I'd go for those cases being an AssignmentExpression, since they're not declaring a variable at all.

Right now there's no way to express "a separated list where every item is optional, but at least one item has to be present" in the language definition DSL, which makes this difficult to encode in the grammar itself.

We could split it into a prefix of empty tuple elements (i.e. (,,,,), then an element that must be present (i.e. bool a), and then a suffix of possibly empty tuple elements (, bool b, , ,)); but that would let a parsing problem make the CST and the general API worse.

Also, I just checked: solc doesn't seem to allow empty tuples at all (i.e. (,,,) = ... or () = ...) after 0.5.0, so maybe we need to validate this after parsing.

I wonder if we have existing cst_output tests for this case?

We have some cases, I added a few more (they're only testing V1 for now)

Also, how about var (,,,,)? this is legal AFAICT.

This is legal with the new definition as well, since the elements of the Separated (UntypedTupleDeconstructionElement) have an optional identifier as their only field:

Struct( name = UntypedTupleDeconstructionElement, enabled = Till("0.5.0"), fields = (name = Optional(reference = Identifier)) ),
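The "validate this after parsing" idea mentioned above could look roughly like the following. This is only a sketch: `TupleElement`, `TupleError`, and `validate_tuple` are hypothetical names, a tuple element is modelled as a plain `Option<String>`, and the version gating (solc only rejects this after 0.5.0) is left out.

```rust
/// Hypothetical stand-in for a parsed tuple element: `None` models an empty
/// slot, such as each gap in `(,,,)`.
pub type TupleElement = Option<String>;

#[derive(Debug, PartialEq, Eq)]
pub enum TupleError {
    /// Every slot was empty, e.g. `(,,,) = ...` or `() = ...`.
    AllElementsEmpty,
}

/// Enforces "every item is optional, but at least one must be present",
/// which the grammar accepts but cannot express on its own.
pub fn validate_tuple(elements: &[TupleElement]) -> Result<(), TupleError> {
    if elements.iter().any(Option::is_some) {
        Ok(())
    } else {
        Err(TupleError::AllElementsEmpty)
    }
}
```

Keeping the check out of the grammar means the CST keeps a single flat `Separated` list, at the cost of one extra pass.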

OmarTawfik · 2026-01-30T15:19:33Z

crates/solidity-v2/inputs/language/src/BREAKING_CHANGES.md

+
+This makes certain cases that were allowed before disallowed in V2, in particular having untyped declarations (like `(a, bool b) = ...`)
+or having typed together with `var` (like `var (a, bool b) = ...`).
+The cases where using empty tuples are still ambiguous, `(,,,) = ...` can still be a `TupleDeconstructionStatement` or a


For another PR/discussion, we considered merging TupleDeconstructionStatement and VariableDeclaration, to merge all variable declarations together, however I think this will look a bit artificial since their shape is quite different.

I think having this distinction (declaring new vars vs. assigning to existing ones) is worth it, both syntactically and semantically. WDYT of having the following structure, if it works with LALR?

 - Use VariableDeclarationStatement for any syntax that declares a new name:
    - var x = ... (already supported)
    - int x = ... (already supported)
    - Change the VariableDeclarationStatement::name field to an enum with two variants:
        - name: Identifier -> existing
        - elements -> a struct holding LeftParen + Separated(elements) + RightParen
 - Use AssignmentExpression for any syntax that just assigns values to the LHS:
    - x = ....
    - (x, y) = ....
    - (,,,) = ....

I think having this distinction (...) is worth it

I completely agree

The problem I see with the proposed structure is that currently the int and the var in the first 2 cases are captured by the same definition, so you could end up with a language accepting something like int (a, b, c) = ....

But also, since you want to allow for the elements of the tuple to have the type within it, you'd maybe want to make the var/int struct optional, to allow (bool a, uint b) = ..., but that would also parse x = ... as valid.
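To make the first concern concrete, here is a rough Rust model of the proposed structure. The type and field names (`DeclarationType`, `DeclarationName`, `declaration_type`) are made up for illustration and are not taken from definition.rs. Because the type keyword and the name vary independently, the combination `int (a, b, c) = ...` is representable unless something else rules it out.

```rust
/// `var` or an explicit type such as `int`; both are captured by the same
/// definition, as noted above. Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DeclarationType {
    Var,
    Explicit(String),
}

/// The proposed two-variant `name` field.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DeclarationName {
    Identifier(String),
    /// Parenthesized, comma-separated elements; `None` is an empty slot.
    Elements(Vec<Option<String>>),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VariableDeclarationStatement {
    pub declaration_type: DeclarationType,
    pub name: DeclarationName,
}

/// True for shapes like `int (a, b, c) = ...`, which this structure can
/// represent even though the language should not accept them.
pub fn is_explicit_type_with_tuple(statement: &VariableDeclarationStatement) -> bool {
    matches!(
        (&statement.declaration_type, &statement.name),
        (DeclarationType::Explicit(_), DeclarationName::Elements(_))
    )
}
```

A check like `is_explicit_type_with_tuple` would have to run as a separate validation step, which is the kind of extra work a stricter split in the grammar avoids.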

From the perspective of appeasing LALRPOP, I'd say the distinction has to be a bit stronger, so I'd make VariableDeclarationStatement an enum over:

 - SingleExplicitDeclaration: int x = ...
 - MultiExplicitDeclaration: (bool a, , int b) = ...
 - ImplicitDeclaration (until 0.5.0): var a = ... and var (a, , b) = ... (so this one would hold an enum allowing either a single Identifier or a tuple of Identifiers)

This could be simplified when lowering it to an AST

What do you think? I'll try to push a single commit with these changes so you can review them.
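As a sketch of the three-way enum described above and of how it could be simplified when lowering to an AST: the variant names come from the comment, while `TypedName`, `ImplicitTarget`, `LoweredSlot`, and `lower` are hypothetical and only illustrate one possible uniform shape. Initializer expressions are omitted.

```rust
/// A typed binding such as `bool a` (hypothetical helper type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TypedName {
    pub type_name: String,
    pub name: String,
}

/// Target of a `var` declaration: a single identifier or a tuple of them.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ImplicitTarget {
    Single(String),
    Tuple(Vec<Option<String>>),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum VariableDeclarationStatement {
    /// `int x = ...`
    SingleExplicitDeclaration(TypedName),
    /// `(bool a, , int b) = ...`; `None` is an empty slot.
    MultiExplicitDeclaration(Vec<Option<TypedName>>),
    /// `var a = ...` or `var (a, , b) = ...` (until 0.5.0).
    ImplicitDeclaration(ImplicitTarget),
}

/// Hypothetical lowered form: one declared name with an optional type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LoweredSlot {
    pub type_name: Option<String>,
    pub name: String,
}

fn typed_slot(typed: &TypedName) -> LoweredSlot {
    LoweredSlot { type_name: Some(typed.type_name.clone()), name: typed.name.clone() }
}

fn untyped_slot(name: &String) -> LoweredSlot {
    LoweredSlot { type_name: None, name: name.clone() }
}

/// Flattens all three syntactic shapes into one list of slots.
pub fn lower(statement: &VariableDeclarationStatement) -> Vec<Option<LoweredSlot>> {
    match statement {
        VariableDeclarationStatement::SingleExplicitDeclaration(typed) => {
            vec![Some(typed_slot(typed))]
        }
        VariableDeclarationStatement::MultiExplicitDeclaration(elements) => elements
            .iter()
            .map(|element| element.as_ref().map(typed_slot))
            .collect(),
        VariableDeclarationStatement::ImplicitDeclaration(ImplicitTarget::Single(name)) => {
            vec![Some(untyped_slot(name))]
        }
        VariableDeclarationStatement::ImplicitDeclaration(ImplicitTarget::Tuple(names)) => names
            .iter()
            .map(|name| name.as_ref().map(untyped_slot))
            .collect(),
    }
}
```

The grammar stays strict (no `int (a, b, c)` and no mixing `var` with typed elements), while later passes can work with a single list of slots.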

teofr requested review from a team as code owners January 28, 2026 10:12

OmarTawfik requested changes Jan 30, 2026


teofr force-pushed the teofr/node_checker branch from d95ed0d to db16409 Compare February 2, 2026 15:57

teofr added 3 commits February 2, 2026 16:52

Changes to V2 language definition

2f51354

Fix lints

432f2dd

Addressed (some) comments

c14dcbf

teofr force-pushed the teofr/v2-definition-changes branch from fb9d550 to c14dcbf Compare February 2, 2026 16:52
